Selection of Minimum Subsets of Single Nucleotide Polymorphisms to Capture Haplotype Block Diversity

Authors

  • Hadar I. Avi-Itzhak
  • Xiaoping Su
  • Francisco M. de la Vega

Abstract

We present a simple numerical algorithm to select the minimal subset of SNPs required to capture the diversity of haplotype blocks or other genetic loci. The algorithm can quickly select the minimum SNP subset with no loss of haplotype information. It can also be run in a more aggressive mode that further reduces the original SNP set with minimal loss of information. We demonstrate the algorithm's performance on data from over 11,000 SNPs with an average spacing of 6 to 11 kb, covering all the genes on chromosomes 6, 21, and 22 and genotyped in DNA samples from 45 unrelated African-Americans and 45 Caucasians from the Coriell Human Diversity Collection. With no loss of information, we reduced the number of SNPs required to capture haplotype block diversity by 25% in the African-American population and by 36% in the Caucasian population. Allowing a maximum loss of 10% of the haplotype distribution information, the reduction was 38% and 49% for the two populations, respectively. All computations for the entire dataset took less than 1 minute.
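The abstract does not spell out how the algorithm works, so the following is only a minimal sketch of the two modes it describes, not the authors' method. It assumes that each block is given as phased haplotypes (one allele string per chromosome, with repeats encoding frequency). It also assumes that "information" is measured as the Shannon entropy of the haplotype distribution. Under that assumption, a lossless subset keeps every distinct haplotype distinguishable, and the aggressive mode accepts a subset that retains at least `(1 - max_loss)` of the full entropy. The function names `min_snp_subset` and `projected_entropy` are hypothetical. The search is exhaustive over subsets of increasing size, so it returns a true minimum but only scales to small blocks.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def projected_entropy(haplotypes, snp_idx):
    """Entropy of the haplotype distribution after keeping only the SNP columns in snp_idx."""
    counts = Counter(tuple(h[i] for i in snp_idx) for h in haplotypes)
    return entropy(counts.values())


def min_snp_subset(haplotypes, max_loss=0.0):
    """Smallest tuple of 0-based SNP indices retaining >= (1 - max_loss) of the
    full haplotype entropy (assumed information measure).

    max_loss=0.0 is the lossless mode: entropy is preserved exactly only when
    every distinct haplotype still maps to a distinct allele pattern.
    """
    n_snps = len(haplotypes[0])
    full = projected_entropy(haplotypes, range(n_snps))
    if full == 0:
        return ()  # a single haplotype: no SNP is needed to tell it apart
    target = (1 - max_loss) * full - 1e-12  # small tolerance for float rounding
    # Exhaustive search by subset size: exact minimum, exponential cost.
    for k in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), k):
            if projected_entropy(haplotypes, subset) >= target:
                return subset
    return tuple(range(n_snps))
```

As a usage example, take the toy block `["AAGT", "AAGT", "CAGA", "CTGA", "AAGT", "CTGA"]`. No single SNP separates all three distinct haplotypes, so `min_snp_subset(block)` returns `(0, 1)`. With `max_loss=0.35`, SNP 0 alone retains enough entropy and the call returns `(0,)`. For real blocks a greedy search would be the usual swap-in when exhaustive search becomes too slow.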

Similar resources

Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model

Single Nucleotide Polymorphisms (SNPs) are the most common form of polymorphism in the human genome. Analyses of genetic variations have revealed that individual genomes share common SNP haplotypes. The particular pattern of these common variations forms a block-like structure in the human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...

Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies.

It is widely hoped that the study of sequence variation in the human genome will provide a means of elucidating the genetic component of complex diseases and variable drug responses. A major stumbling block to the successful design and execution of genome-wide disease association studies using single-nucleotide polymorphisms (SNPs) and linkage disequilibrium is the enormous number of SNPs in th...

Association of IGF-I Gene Polymorphisms with Carcass Traits in Iranian Mehraban Sheep Using SSCP Analysis

Molecular genetic selection on individual genes is a promising method to genetically improve economically important traits in livestock. The insulin-like growth factor-I (IGF-I) gene may play important roles in the growth of multiple tissues, including muscle cells, cartilage and bone. The objectives of the present study were to estimate the haplotype frequencies of the IGF-I gene polymorphisms i...

A sparse marker extension tree algorithm for selecting the best set of haplotype tagging single nucleotide polymorphisms.

Single nucleotide polymorphisms (SNPs) play a central role in the identification of susceptibility genes for common diseases. Recent empirical studies on human genome have revealed block-like structures, and each block contains a set of haplotype tagging SNPs (htSNPs) that capture a large fraction of the haplotype diversity. Herein, we present an innovative sparse marker extension tree (SMET) a...

Efficient Algorithms for SNP Haplotype Block Selection Problems

Global patterns of human DNA sequence variation (haplotypes) defined by common single nucleotide polymorphisms (SNPs) have important implications for identifying disease associations and human traits. Recent genetics research reveals that SNPs within certain haplotype blocks induce only a few distinct common haplotypes in the majority of the population. The existence of haplotype block structur...

Journal:
  • Pacific Symposium on Biocomputing

Publication date: 2003